Supporting user-subjective categorization with self-organizing maps and learning vector quantization
Authors
Abstract
… we requested the user to reclassify documents that the system had misclassified. Results show that, despite the subjective nature of human categorization, automatic document categorization methods correlate well with subjective, personal categorization, and that the LVQ method outperforms the SOM. The reclassification process revealed an interesting pattern: about 40% of the documents were classified according to their original categorization, about 35% according to the system's categorization (the users changed the original categorization), and the remainder (roughly 25%) received a different, new categorization. Based on these results, we conclude that automatic support for subjective categorization is feasible; however, an exact match is probably impossible because users' categorization behavior changes.
Introduction
Clustering is defined as the unsupervised classification of patterns into groups (Jain, Murty, & Flynn, 1999). A wide variety of clustering methods, among them clustering based on artificial neural networks (ANN), have been applied in domains such as image segmentation, object and character recognition, data mining, information retrieval, and document categorization (Jain et al., 1999). Text categorization is the assignment of documents to one or more predefined categories. Users generally define subjective categories based on personal preferences and assign the documents they read or write to categories according to these subjective definitions. Clearly, the categories employed by an information user are idiosyncratic to that user. A major problem in building an automatic, adaptable text categorization system is determining to what extent it can reflect the user's subjective view of his/her domain of interest. Users are generally inconsistent in their classifications (Dawes, 1979) and tend to change the document classification they use over time.
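The users' inconsistency and the reclassification pattern reported in the abstract (about 40% back to the original category, about 35% to the system's category, the rest to a new one) can be tallied mechanically. The sketch below assumes each misclassified document carries three labels: the user's original category, the system's category, and the category the user chose when asked to reclassify it. The function names, the simple match-rate measure `agreement`, and the toy labels are hypothetical illustrations, not the study's code, metric, or data.

```python
from collections import Counter


def agreement(user_labels, system_labels):
    """Fraction of documents the system placed in the user's own category."""
    hits = sum(u == s for u, s in zip(user_labels, system_labels))
    return hits / len(user_labels)


def reclassification_pattern(original, system, revised):
    """For documents the system misclassified, return the share the user
    moved back to the original category, moved to the system's category,
    or put in a different (new) category when asked to reclassify them."""
    outcome = Counter()
    for o, s, r in zip(original, system, revised):
        if o == s:
            continue  # correctly classified documents are not sent back to the user
        if r == o:
            outcome["kept original"] += 1
        elif r == s:
            outcome["adopted system"] += 1
        else:
            outcome["new category"] += 1
    keys = ("kept original", "adopted system", "new category")
    total = sum(outcome[k] for k in keys)
    return {k: (outcome[k] / total if total else 0.0) for k in keys}


# Hypothetical toy labels for six documents (not the study's data).
original = ["work", "work", "home", "home", "travel", "travel"]
system = ["work", "home", "home", "work", "work", "home"]
revised = ["work", "work", "home", "work", "home", "travel"]

print(agreement(original, system))  # 2 of the 6 documents agree
print(reclassification_pattern(original, system, revised))
# → {'kept original': 0.5, 'adopted system': 0.25, 'new category': 0.25}
```

Only the four misclassified documents enter the second tally, mirroring the study's procedure of asking the user to reclassify just the documents the system got wrong.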
In addition, users may find a document relevant to more than one category but usually choose just one category to host it. It has been found that automated text categorization can reach high accuracy (90% or higher) on standard collections such as Reuters (Dumais, Platt, Heckerman, & Sahami, 1998). Our question is how an automatic system can guess the user's subjective classification in personal text collections. This is a growing challenge in today's networked environment, where users are flooded with information daily. The trend is illustrated by the work of Roussinov and Zhao (2003), who cluster messages generated in computer-mediated meetings to help users cope with information overload in those meetings. The fact that there is a correlation …
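The abstract reports that LVQ outperformed the SOM when guessing a user's categories. As an illustration only, the sketch below assumes Kohonen's basic LVQ1 rule, documents already encoded as term-weight vectors, and one prototype per user category initialized from a labeled document; the paper's actual LVQ variant, document representation, and parameters are not given here, and names such as `train_lvq1` and the linear learning-rate decay are hypothetical. Unlike a SOM, which moves the winning unit and its map neighbours toward each input without using labels, LVQ1 uses the user's category to pull the winning prototype toward a document or push it away.

```python
# Minimal LVQ1 sketch; toy data, names, and schedule are hypothetical.


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype(prototypes, x):
    """Index of the prototype closest to document vector x."""
    return min(range(len(prototypes)), key=lambda i: squared_distance(prototypes[i], x))


def train_lvq1(docs, labels, prototypes, proto_labels, alpha=0.1, epochs=30):
    """Return prototypes trained with the LVQ1 rule.

    For each labeled document only the winning (nearest) prototype moves:
    toward the document if their categories agree, away from it otherwise.
    """
    protos = [list(p) for p in prototypes]  # copy so the caller's lists are untouched
    for epoch in range(epochs):
        rate = alpha * (1 - epoch / epochs)  # learning rate decays linearly toward 0
        for x, y in zip(docs, labels):
            w = nearest_prototype(protos, x)
            sign = 1.0 if proto_labels[w] == y else -1.0
            protos[w] = [p + sign * rate * (xi - p) for p, xi in zip(protos[w], x)]
    return protos


def classify(prototypes, proto_labels, x):
    """Assign x to the category of its nearest prototype."""
    return proto_labels[nearest_prototype(prototypes, x)]


# Toy term-weight vectors over three hypothetical terms, labeled by one user.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.2, 0.0],
        [0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.0, 0.8, 0.2]]
labels = ["finance"] * 3 + ["sports"] * 3

# One prototype per category, initialized from a document of that category.
proto_labels = ["finance", "sports"]
trained = train_lvq1(docs, labels, [docs[0], docs[3]], proto_labels)

print(classify(trained, proto_labels, [0.95, 0.05, 0.0]))  # → finance
print(classify(trained, proto_labels, [0.05, 0.95, 0.0]))  # → sports
```

Because the prototypes are driven only by the user's labeled examples, retraining on a user's revised labels moves them toward the new categorization.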
Similar resources
Can Automatic Personal Categorization deal with User Inconsistency?
Document categorization is a daily task in every organization, but it is a very subjective process. While automatic document categorization has been widely studied, much challenging research still remains to support user subjective categorization. This study evaluates and compares the application of Self-Organizing Maps (SOM) and Learning Vector Quantization (LVQ) to automatic document classifi...
Automating Personal Categorization Using Artificial Neural Networks
Organizations as well as personal users invest a great deal of time in assigning documents they read or write to categories. Automatic document classification that matches user subjective classification is widely used, but much challenging research still remains to be done. The self-organizing map (SOM) is an artificial neural network (ANN) that is mathematically characterized by transforming hi...
EM Algorithms for Self-Organizing Maps
Self-organizing maps are popular algorithms for unsupervised learning and data visualization. Exploiting the link between vector quantization and mixture modeling, we derive EM algorithms for self-organizing maps with and without missing values. We compare self-organizing maps with the elastic-net approach and explain why the former is better suited for the visualization of high-dimensional dat...
Air Quality Modelling by Kohonen’s Self-organizing Feature Maps and LVQ Neural Networks
The paper presents a design of parameters for air quality modelling and the classification of districts into classes according to their pollution. Further, it presents a model design, data pre-processing, the designs of various structures of Kohonen’s Self-organizing Feature Maps (unsupervised methods), the clustering by K-means algorithm and the classification by Learning Vector Quantization n...
Journal title:
- JASIST
Volume 56, Issue
Pages -
Publication date: 2005